skip to main content


Search for: All records

Creators/Authors contains: "Vaidya, Jaideep"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Nikolski, Macha (Ed.)
    Abstract Motivation

    Genome-wide association studies (GWAS) benefit from the increasing availability of genomic data and cross-institution collaborations. However, sharing data across institutional boundaries jeopardizes medical data confidentiality and patient privacy. While modern cryptographic techniques provide formal secure guarantees, the substantial communication and computational overheads hinder the practical application of large-scale collaborative GWAS.

    Results

    This work introduces an efficient framework for conducting collaborative GWAS on distributed datasets, maintaining data privacy without compromising the accuracy of the results. We propose a novel two-step strategy aimed at reducing communication and computational overheads, and we employ iterative and sampling techniques to ensure accurate results. We instantiate our approach using logistic regression, a commonly used statistical method for identifying associations between genetic markers and the phenotype of interest. We evaluate our proposed methods using two real genomic datasets and demonstrate their robustness in the presence of between-study heterogeneity and skewed phenotype distributions using a variety of experimental settings. The empirical results show the efficiency and applicability of the proposed method and the promise for its application for large-scale collaborative GWAS.

    Availability and implementation

    The source code and data are available at https://github.com/amioamo/TDS.

     
    more » « less
    Free, publicly-accessible full text available October 1, 2024
  2. Abstract

    The rapid improvements in genomic sequencing technology have led to the proliferation of locally collected genomic datasets. Given the sensitivity of genomic data, it is crucial to conduct collaborative studies while preserving the privacy of the individuals. However, before starting any collaborative research effort, the quality of the data needs to be assessed. One of the essential steps of the quality control process is population stratification: identifying the presence of genetic difference in individuals due to subpopulations. One of the common methods used to group genomes of individuals based on ancestry is principal component analysis (PCA). In this article, we propose a privacy-preserving framework which utilizes PCA to assign individuals to populations across multiple collaborators as part of the population stratification step. In our proposed client-server-based scheme, we initially let the server train a global PCA model on a publicly available genomic dataset which contains individuals from multiple populations. The global PCA model is later used to reduce the dimensionality of the local data by each collaborator (client). After adding noise to achieve local differential privacy (LDP), the collaborators send metadata (in the form of their local PCA outputs) about their research datasets to the server, which then aligns the local PCA results to identify the genetic differences among collaborators’ datasets. Our results on real genomic data show that the proposed framework can perform population stratification analysis with high accuracy while preserving the privacy of the research participants.

     
    more » « less
  3. Symptoms-tracking applications allow crowdsensing of health and location related data from individuals to track the spread and outbreaks of infectious diseases. During the COVID-19 pandemic, for the first time in history, these apps were widely adopted across the world to combat the pandemic. However, due to the sensitive nature of the data collected by these apps, serious privacy concerns were raised and apps were critiqued for their insufficient privacy safeguards. The Covid Nearby project was launched to develop a privacy-focused symptoms-tracking app and to understand the privacy preferences of users in health emergencies. In this work, we draw on the insights from the Covid Nearby users' data, and present an analysis of the significantly varying trends in users' privacy preferences with respect to demographics, attitude towards information sharing, and health concerns, e.g. after being possibly exposed to COVID-19. These results and insights can inform health informatics researchers and policy designers in developing more socially acceptable health apps in the future. 
    more » « less
  4. Open data sets that contain personal information are susceptible to adversarial attacks even when anonymized. By performing low-cost joins on multiple datasets with shared attributes, malicious users of open data portals might get access to information that violates individuals’ privacy. However, open data sets are primarily published using a release-and-forget model, whereby data owners and custodians have little to no cognizance of these privacy risks. We address this critical gap by developing a visual analytic solution that enables data defenders to gain awareness about the disclosure risks in local, joinable data neighborhoods. The solution is derived through a design study with data privacy researchers, where we initially play the role of a red team and engage in an ethical data hacking exercise based on privacy attack scenarios. We use this problem and domain characterization to develop a set of visual analytic interventions as a defense mechanism and realize them in PRIVEE, a visual risk inspection workflow that acts as a proactive monitor for data defenders. PRIVEE uses a combination of risk scores and associated interactive visualizations to let data defenders explore vulnerable joins and interpret risks at multiple levels of data granularity. We demonstrate how PRIVEE can help emulate the attack strategies and diagnose disclosure risks through two case studies with data privacy experts. 
    more » « less
  5. Providing provenance in scientific workflows is essential for reproducibility and auditability purposes. In this work, we propose a framework that verifies the correctness of the aggregate statistics obtained as a result of a genome-wide association study (GWAS) conducted by a researcher while protecting individuals’ privacy in the researcher’s dataset. In GWAS, the goal of the researcher is to identify highly associated point mutations (variants) with a given phenotype. The researcher publishes the workflow of the conducted study, its output, and associated metadata. They keep the research dataset private while providing, as part of the metadata, a partial noisy dataset (that achieves local differential privacy). To check the correctness of the workflow output, a verifier makes use of the workflow, its metadata, and results of another GWAS (conducted using publicly available datasets) to distinguish between correct statistics and incorrect ones. For evaluation, we use real genomic data and show that the correctness of the workflow output can be verified with high accuracy even when the aggregate statistics of a small number of variants are provided. We also quantify the privacy leakage due to the provided workflow and its associated metadata and show that the additional privacy risk due to the provided metadata does not increase the existing privacy risk due to sharing of the research results. Thus, our results show that the workflow output (i.e., research results) can be verified with high confidence in a privacy-preserving way. We believe that this work will be a valuable step towards providing provenance in a privacy-preserving way while providing guarantees to the users about the correctness of the results. 
    more » « less
  6. Multiple symptom tracking applications (apps) were created during the early phase of the COVID-19 pandemic. While they provided crowdsourced information about the state of the pandemic in a scalable manner, they also posed significant privacy risks for individuals. The present study investigates the interplay between individual privacy attitudes and the adoption of symptom tracking apps. Using the communication privacy theory as a framework, it studies how users’ privacy attitudes changed during the public health emergency compared to the pre-COVID times. Based on focus-group interviews (N = 21), this paper reports significant changes in users’ privacy attitudes toward such apps. Research participants shared various reasons for both increased acceptability (e.g., disease uncertainty, public good) and decreased acceptability (e.g., reduced utility due to changed lifestyle) during COVID. The results of this study can assist health informatics researchers and policy designers in creating more socially acceptable health apps in the future. 
    more » « less